Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

Khattab, Omar; Potts, Christopher; Zaharia, Matei

arXiv:2101.00436 (cs)

[Submitted on 2 Jan 2021 (v1), last revised 10 Jul 2022 (this version, v3)]

Abstract: Multi-hop reasoning (i.e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge. To retrieve evidence passages, multi-hop models must contend with a fast-growing search space across the hops, represent complex queries that combine multiple information needs, and resolve ambiguity about the best order in which to hop between training passages. We tackle these problems via Baleen, a system that improves the accuracy of multi-hop retrieval while learning robustly from weak training signals in the many-hop setting. To tame the search space, we propose condensed retrieval, a pipeline that summarizes the retrieved passages after each hop into a single compact context. To model complex queries, we introduce a focused late interaction retriever that allows different parts of the same query representation to match disparate relevant passages. Lastly, to infer the hopping dependencies among unordered training passages, we devise latent hop ordering, a weak-supervision strategy in which the trained retriever itself selects the sequence of hops. We evaluate Baleen on retrieval for two-hop question answering and many-hop claim verification, establishing state-of-the-art performance.
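
The condensed retrieval idea described in the abstract (retrieve, summarize into one compact context, then use that context in the next hop's query) can be illustrated with a toy sketch. This is not the paper's implementation: the word-overlap scorer and the sentence filter below are simple stand-ins for the learned retriever and condenser, and every name here (`content_words`, `retrieve`, `condense`, `condensed_retrieval`, `CORPUS`, `QUESTION`) is hypothetical, as is the three-passage corpus.

```python
import re
from typing import List, Set

# Assumption: a tiny hand-picked stopword list, just enough for this toy example.
STOPWORDS = {"a", "by", "did", "in", "is", "of", "the", "up", "where"}


def content_words(text: str) -> Set[str]:
    """Lowercased alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: List[str], exclude: Set[int], k: int) -> List[int]:
    """Toy retriever: rank unseen passages by word overlap with the query."""
    q = content_words(query)
    scored = [
        (len(q & content_words(passage)), i)
        for i, passage in enumerate(corpus)
        if i not in exclude
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then corpus order
    return [i for score, i in scored[:k] if score > 0]


def condense(query: str, passages: List[str]) -> List[str]:
    """Toy condenser: keep only sentences that share a content word with the query."""
    q = content_words(query)
    facts = []
    for passage in passages:
        for sentence in re.split(r"(?<=\.)\s+", passage):
            if q & content_words(sentence):
                facts.append(sentence)
    return facts


def condensed_retrieval(question: str, corpus: List[str], hops: int = 2, k: int = 1) -> str:
    """Each hop queries with the question plus the condensed context so far."""
    context: List[str] = []
    seen: Set[int] = set()
    for _ in range(hops):
        query = " ".join([question] + context)
        hits = retrieve(query, corpus, seen, k)
        seen.update(hits)
        context += condense(query, [corpus[i] for i in hits])
    return " ".join(context)


# Hypothetical corpus: the second hop is only reachable via the first hop's fact.
CORPUS = [
    "Bob Stone paints landscapes in Oslo. He works at night.",
    "Alice Moreau grew up in Lisbon. She enjoys sailing.",
    "Redwood is a novel by author Alice Moreau. It has twelve chapters.",
]
QUESTION = "Where did the author of Redwood grow up?"

result = condensed_retrieval(QUESTION, CORPUS)
print(result)
```

In this sketch the question alone matches only the Redwood passage; the condensed fact from that hop contributes the words that let the second hop reach the Lisbon passage, while the off-topic sentences are dropped so the carried context stays short.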

Comments:	NeurIPS 2021 (Spotlight)
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2101.00436 [cs.CL]
	(or arXiv:2101.00436v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2101.00436

Submission history

From: Omar Khattab
[v1] Sat, 2 Jan 2021 11:52:20 UTC (257 KB)
[v2] Sun, 18 Apr 2021 09:56:09 UTC (724 KB)
[v3] Sun, 10 Jul 2022 17:40:32 UTC (1,016 KB)
